HumanFirst Evaluation

This notebook presents a small study of the performance and usefulness of HumanFirst, an NLU data-labeling tool. 88,015 datapoints were uploaded to HumanFirst, and the tool was then used to label 1,782 of them. While doing so, information such as the time taken, the number of datapoints, and the difficulty of each individual action performed in HumanFirst was gathered and recorded in the hf-evaluation-data.csv file. We take a deeper look into this data and build a neural-network machine learning model to predict the time required to label thousands of datapoints with HumanFirst.

Labeled datapoints: 1,782
Total datapoints: 88,015
  • Note that this is just one approach, and the actual time for your case might differ. In general, HumanFirst is a very good tool.
  • The term datapoint is used here to mean the same thing as a text sentence.

The Data

hf-evaluation-data.csv
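To explore the log, the CSV can be loaded with pandas. The column names and rows below are made-up stand-ins so the snippet runs on its own; for the real analysis, replace the buffer with `pd.read_csv("hf-evaluation-data.csv")` and the actual schema.

```python
import io
import pandas as pd

# Illustrative stand-in for hf-evaluation-data.csv; the real file holds
# one row per operational action performed in HumanFirst.
sample = io.StringIO(
    "suggested,generally_related,truly_related\n"
    "24,24,23\n"
    "30,29,28\n"
    "12,12,11\n"
)
df = pd.read_csv(sample)

# Quick summary statistics per column (mean, min, max, quartiles).
print(df.describe())
```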

Functionality Operations

Successful Clustering Rate

100%

The rate of successfully clustering data is `100%`, meaning that we are *successful* every time we perform an operational action to group data into different intents, find similar datapoints, or disambiguate intents. This indicates that our operational actions using HumanFirst will most likely help us build/improve our `NLU` model every time.
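A minimal sketch of how this rate could be computed from the log; `outcomes` is a hypothetical stand-in for a success/failure column in hf-evaluation-data.csv.

```python
# Every logged clustering action in the evaluation succeeded, so the
# toy column below contains only successes.
outcomes = ["success"] * 10

# Fraction of actions that succeeded.
rate = outcomes.count("success") / len(outcomes)
print(f"Successful clustering rate: {rate:.0%}")  # -> 100%
```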

Looking at the above statistics, we can see that

  • On average, HumanFirst suggests 24 sample sentences per classification operation. Among those 24 sentences, 23.95 are generally related to each other (have similar meaning) while 0.05 are unrelated. If we judge the meaning of those 23.95 generally related sentences more strictly, 22.95 are truly related to each other and can form a cluster (intent) of sentences, while 0.95 are not. Roughly speaking, each time HumanFirst suggests 24 text sentences, 22 or 23 are truly OK to assign to a cluster, and 1 or 2 are not OK to assign to the cluster of the majority.
  • We can label up to 142 sentences at a time (this number may differ based on how we use the tool).
  • The number of sample sentences suggested for clustering is usually between 6 and 35, with a median of 12.
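The summary statistics above can be reproduced with the standard library once the per-operation suggestion counts are extracted from hf-evaluation-data.csv. The list below is a toy sample chosen for illustration, not the real data.

```python
import statistics

# Hypothetical per-operation suggestion counts; the real values come
# from the suggestion-count column of hf-evaluation-data.csv.
suggested = [6, 10, 12, 12, 12, 24, 35, 142]

print("mean:", statistics.mean(suggested))
# Median of this toy sample is 12, matching the reported statistic.
print("median:", statistics.median(suggested))
print("range:", min(suggested), "-", max(suggested))
```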